Automatic Script and Type Identification in Bi-lingual Forms

Authors

  • Afef Kacem Echi
  • Asma Saïdani
Abstract

This paper presents a system that automatically discriminates between machine-printed and handwritten words in structured bilingual (Arabic and French) forms. The system has been applied, with encouraging results, to forms used by the Tunisian National Health Insurance Fund for refunding medical care costs. In these forms, handwritten data often touch or cross the preprinted frames and text, which complicates recognition. Each text type should also be processed with a different method to optimize recognition accuracy. This work addresses these issues, focusing on machine-printed/handwritten and Arabic/French word discrimination. To this end, we compute a co-occurrence matrix of oriented gradients from each word image and use it as input to a k-Nearest Neighbor classifier. Experiments were carried out on 20 forms, achieving an average script identification rate of 98.31%.
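The feature and classifier named in the abstract can be illustrated with a minimal sketch. The abstract does not give the number of orientation bins, the pixel offset, the distance metric, or the value of k, so every parameter below is an assumption, and the function names `cooccurrence_of_oriented_gradients`, `knn_predict`, and `stripes` are hypothetical. The toy striped images stand in for word images; in the paper's setting the labels would be machine-printed/handwritten or Arabic/French.

```python
import numpy as np

def cooccurrence_of_oriented_gradients(img, n_bins=8, offset=(0, 1)):
    """Hypothetical helper: normalized co-occurrence matrix of quantized
    gradient orientations, flattened into a feature vector.
    n_bins and offset (dy, dx >= 0) are assumptions, not the paper's values."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                    # derivatives along rows, columns
    mag = np.hypot(gx, gy)                       # gradient magnitude
    angle = np.mod(np.arctan2(gy, gx), np.pi)    # unsigned orientation in [0, pi)
    bins = np.minimum((angle / np.pi * n_bins).astype(int), n_bins - 1)

    dy, dx = offset
    h, w = bins.shape
    a, b = bins[:h - dy, :w - dx], bins[dy:, dx:]        # pixel pairs at the offset
    # Count only pairs where both pixels have a non-zero gradient.
    valid = (mag[:h - dy, :w - dx] > 0) & (mag[dy:, dx:] > 0)

    mat = np.zeros((n_bins, n_bins))
    np.add.at(mat, (a[valid], b[valid]), 1.0)            # accumulate pair counts
    total = mat.sum()
    return (mat / total).ravel() if total > 0 else mat.ravel()

def knn_predict(train_x, train_y, query, k=3):
    """Plain k-NN with Euclidean distance and majority vote (assumed settings)."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def stripes(period, vertical, size=16):
    """Toy stand-in for a word image: binary stripes of a given period."""
    row = (np.arange(size) % period < period // 2).astype(float)
    img = np.tile(row, (size, 1))
    return img if vertical else img.T

# Toy training set: two samples per class.
train_imgs = [stripes(4, True), stripes(6, True), stripes(4, False), stripes(6, False)]
train_y = np.array(["vertical", "vertical", "horizontal", "horizontal"])
train_x = np.stack([cooccurrence_of_oriented_gradients(im) for im in train_imgs])

query = cooccurrence_of_oriented_gradients(stripes(8, True))
prediction = knn_predict(train_x, train_y, query, k=3)
print(prediction)
```

Restricting the counts to pixels with a non-zero gradient keeps flat background regions from dominating the matrix; whether the original system does this is not stated in the abstract.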

Similar Resources

Script Identification of Text Words from a Tri Lingual Document Using Voting Technique

In a multi-script environment, the majority of documents may contain text printed in more than one script or language. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify the different script regions of the document. In this context, this paper proposes to develop a model to identify and separate text words of Kannada, H...

A Script Recognizer Independent Bi-lingual Character Recognition System for Printed English and Kannada Documents

Department of Computer Science, Amrita Vishwa Vidyapeetham, Mysore Campus, Bogadi, Mysore, India. Abstract: Recognition of text document images is the inclination of any optical character recognition system. This paper aims at extending the functionality of optical character recognition system to recognize more t...

Handwritten Script Identification from a Bi-Script Document at Line Level using Gabor Filters

In a country like India, where a large number of scripts are in use, automatic identification of printed and handwritten script facilitates many important applications, including sorting document images and searching online archives of document images. In this paper, a Gabor feature based approach is presented to identify different Indian scripts from handwritten document images. Eight popular In...

Script Identification from Bilingual Gujarati-English Documents

In a multilingual country like India, English words are often interspersed within the Indian regional languages in most official papers, school textbooks, and magazines. A bilingual Optical Character Recognition (OCR) system is therefore needed that can recognize these bilingual documents and store them for future use. In this paper, the authors present an OCR system developed for the script ...

Handwritten Script Recognition Using DCT, Gabor Filter and Wavelet Features at Line Level

In a country like India, where a large number of scripts are in use, automatic identification of printed and handwritten script facilitates many important applications, including sorting document images and searching online archives of document images. In this paper, a multiple feature based approach is presented to identify the script type of the collection of handwritten documents. Eight popula...

Publication date: 2016